Parquet add no convert marker#7625
Conversation
| if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) { | ||
| level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String()) | ||
| c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc() | ||
| continue |
There was a problem hiding this comment.
- If noConvertMark is upgraded (v1 -> v2), is it correct that the noConvertMark file is overwritten to v2?
- When the user increases this limit (1000 -> 2000), I think we should convert the previous unconverted blocks. (ex. 1500)
ㄴ We can track the current applied limit in noConvertMark and utilize it.
There was a problem hiding this comment.
There is only v1 version right now. I'm not quite sure in which cases the noConvertMark would be upgraded.
My implementation is based on comparing the current limit and label count. By tracking the applied limit, do you mean comparing the limits ?
Like, if current limit > old noConvertMark limit -> Retry conversion
There was a problem hiding this comment.
When the configured limit changes, I think we should re-evaluate the block against the new limit, instead of unconditionally skipping it just because a noConvertMark already exists.
| if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) && noConvertMark.ShouldSkipBlock(maxBlockLabelNames) { | ||
| level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String()) | ||
| c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc() | ||
| continue |
There was a problem hiding this comment.
wdyt?
| if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) && noConvertMark.ShouldSkipBlock(maxBlockLabelNames) { | |
| level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String()) | |
| c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc() | |
| continue | |
| if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) { | |
| level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String()) | |
| c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc() | |
| continue | |
| } | |
| if noConvertMark.ShouldSkipBlock(maxBlockLabelNames) { | |
| level.Debug(logger).Log( | |
| "msg", "skipping block because label count still exceeds current limit", | |
| "block", b.ULID.String(), | |
| "label_names_count", noConvertMark.LabelNamesCount, | |
| "current_limit", maxBlockLabelNames, | |
| ) | |
| c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonTooManyLabels).Inc() | |
| continue | |
| } | |
| } |
There was a problem hiding this comment.
If we change to this way, it means that if the no converter marker already exists we won't re-evaluate the no convert marker check?
There was a problem hiding this comment.
ah, I mean
if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) {
if noConvertMark.Reason != cortex_parquet.NoConvertReasonTooManyLabels {
level.Debug(logger).Log(
"msg", "skipping block, no-convert marker already exists",
"block", b.ULID.String(),
)
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
continue
}
if noConvertMark.ShouldSkipBlock(maxBlockLabelNames) {
level.Debug(logger).Log(
"msg", "skipping block because label count still exceeds current limit",
"block", b.ULID.String(),
"label_names_count", noConvertMark.LabelNamesCount,
"current_limit", maxBlockLabelNames,
)
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonTooManyLabels).Inc()
continue
}
}
There was a problem hiding this comment.
Yep, if we change this way, then every no converter marker will be skipped immediately. ShouldSkipBlock(maxBlockLabelNames) is never be checked
There was a problem hiding this comment.
ah, I mean
if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) { if noConvertMark.Reason != cortex_parquet.NoConvertReasonTooManyLabels { level.Debug(logger).Log( "msg", "skipping block, no-convert marker already exists", "block", b.ULID.String(), ) c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc() continue } if noConvertMark.ShouldSkipBlock(maxBlockLabelNames) { level.Debug(logger).Log( "msg", "skipping block because label count still exceeds current limit", "block", b.ULID.String(), "label_names_count", noConvertMark.LabelNamesCount, "current_limit", maxBlockLabelNames, ) c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonTooManyLabels).Inc() continue } }
This makes sense. I made the same check here:
func (m NoConvertMark) ShouldSkipBlock(currentMaxBlockLabelNamesLimit int) bool {
// Manual no-convert marks are not tied to the label-name limit
if m.Reason != NoConvertReasonTooManyLabels {
return true
}
But, I think writing in the way you suggested make it easier to understand on the first read.
Changes: - Add parquet no-convert marker and read/write logic - Add max-block-label-names limit, blocks exceeding it get a no-convert marker instead of being converted. - Add parquet_converter_max_block_label_names to exporter test - Add integration test for parquet no-convert marker Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
The converter only read no-convert markers when the label-name limit was enabled, so manually marked blocks were still converted when the limit was 0. Read the marker unconditionally before conversion so these blocks stay skipped. Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Retry conversion if the current limit has increased beyond the label count stored in the old no-convert mark. Update converter tests for the new marker fields and retry behavior Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
- Update tests to check skip with lower current limit Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
ee56065 to
6299415
Compare
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
What this PR does:
If a TSDB block exceeds a configurable threshold of distinct label names, the converter writes a
parquet-no-convert-mark.jsonmarker and skips the block.parquet-converter.max-block-label-nameslimitWhich issue(s) this PR fixes:
Fixes #7195
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]docs/configuration/v1-guarantees.mdupdated if this PR introduces experimental flagsNotes from the Previous PR review
cortex_parquet.ValidConverterMarkVersion